Data mining when each data point is a network
Authors
Karthikeyan Rajendran, Department of Chemical & Biological Engineering, Princeton University (present address: 731 Lexington Avenue, New York, NY 10022, USA)
Assimakis Kattis, Theory Group, Department of Computer Science, University of Toronto, 10 King's College Road, Toronto, ON M5S 3G4, Canada
Alexander Holiday, Department of Chemical & Biological Engineering, Princeton University, 41 Olden Street, Princeton, NJ 08544, USA
Risi Kondor, Machine Learning Group, Computer Science & Statistics, University of Chicago, Ryerson 257B, 1100 E. 58th Street, Chicago, IL 60637, USA
Ioannis G. Kevrekidis, Department of Chemical & Biological Engineering and Program in Applied & Computational Mathematics, Princeton University, A319 Engineering Quad, 41 Olden Street, Princeton, NJ 08544, USA; Technische Universität München Institute for Advanced Study; Zuse Institut Berlin
arXiv:1612.02908v1 [cs.SI], 9 Dec 2016
Abstract
We discuss the problem of extending data mining approaches to cases in which each data point is itself an individual graph. Being able to find the intrinsic low dimensionality of ensembles of graphs can be useful in a variety of modeling contexts, especially when coarse-graining the detailed graph information is of interest. One of the main challenges in mining graph data is defining a suitable pairwise similarity metric in the space of graphs. We explore two practical solutions to this problem: one based on subgraph densities and one based on spectral information. The approach is illustrated on three test data sets (ensembles of graphs): two are obtained from standard graph-generating algorithms, while the graphs in the third are sampled as dynamic snapshots of an evolving network simulation.
We further combine these approaches with equation-free techniques, demonstrating how such data mining can enhance the scientific computation of network evolution dynamics.
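The abstract does not specify which subgraphs, which spectrum, or which data-mining method is used, so what follows is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes simple undirected graphs given as 0/1 adjacency matrices (at least 3 nodes), takes edge, two-path (wedge) and triangle densities as the subgraph-density features, takes the k largest adjacency eigenvalues scaled by the number of nodes as the spectral signature, and compares graphs by the Euclidean distance between these feature vectors. To look for low-dimensional structure it applies a basic diffusion map, and the test ensemble is Erdős–Rényi graphs with a varying edge probability p; both of these are assumed choices. All function names (subgraph_densities, spectral_signature, pairwise_distances, diffusion_map, erdos_renyi) are hypothetical.

```python
import numpy as np


def erdos_renyi(n: int, p: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical test generator: symmetric 0/1 adjacency, no self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)


def subgraph_densities(A: np.ndarray) -> np.ndarray:
    """Edge, wedge (2-path) and triangle densities; assumes n >= 3.

    Each count is divided by its maximum on n nodes (the complete graph),
    so graphs of different sizes give comparable feature vectors.
    """
    n = A.shape[0]
    deg = A.sum(axis=1)
    edges = deg.sum() / 2
    wedges = (deg * (deg - 1) / 2).sum()   # 2-paths centred at each node
    triangles = np.trace(A @ A @ A) / 6    # each triangle gives 6 closed 3-walks
    return np.array([
        edges / (n * (n - 1) / 2),
        wedges / (n * (n - 1) * (n - 2) / 2),
        triangles / (n * (n - 1) * (n - 2) / 6),
    ])


def spectral_signature(A: np.ndarray, k: int) -> np.ndarray:
    """k largest adjacency eigenvalues divided by n (assumed scaling, k <= n)."""
    eigvals = np.linalg.eigvalsh(A.astype(float))  # ascending order
    return eigvals[::-1][:k] / A.shape[0]


def pairwise_distances(features: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between rows of a feature array."""
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def diffusion_map(D: np.ndarray, eps: float, n_coords: int = 2) -> np.ndarray:
    """Basic diffusion map (no density normalization) from a distance matrix."""
    W = np.exp(-D ** 2 / eps)              # Gaussian kernel
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))        # symmetric conjugate of P = W / d
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]       # right eigenvectors of P
    # Column 0 is the trivial constant eigenvector; skip it.
    return psi[:, 1:n_coords + 1] * vals[1:n_coords + 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ps = np.linspace(0.1, 0.9, 20)
    graphs = [erdos_renyi(30, p, rng) for p in ps]
    dens = np.array([subgraph_densities(A) for A in graphs])
    spec = np.array([spectral_signature(A, k=5) for A in graphs])
    for name, feats in [("densities", dens), ("spectral", spec)]:
        D = pairwise_distances(feats)
        eps = np.median(D[D > 0]) ** 2     # common heuristic kernel scale
        coord = diffusion_map(D, eps, n_coords=1)[:, 0]
        r = abs(np.corrcoef(coord, ps)[0, 1])
        print(f"{name}: |corr(first diffusion coordinate, p)| = {r:.2f}")
```

If the ensemble is essentially parameterized by a single quantity (here p), one would expect the leading nontrivial diffusion coordinate to vary roughly monotonically with it; the printed correlations are only a quick sanity check of the sketch, not a result from the paper.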
Similar resources
Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining
Industrial-scale R&D IT projects depend on many sub-technologies, which need to be understood and have their risks analysed before the project can begin if it is to succeed. When planning such an industrial-scale project, the list of technologies and their associations with one another is often complex and forms a network. Discovery of this network of technologies is time consumi...
Performance evaluation of chain saw machines for dimensional stones using feasibility of neural network models
Predicting the production rate of the dimensional stone cutting process is crucial, especially when chain saw machines are used. Cutting dimensional rock is generally a complex problem with numerous influential factors, including variable and unreliable conditions of the rocks and cutting machines. The Group Method of Data Handling (GMDH) type of neural network and Radial Basis Functi...
The overall efficiency and projection point in network DEA
Data Envelopment Analysis (DEA) is one of the best methods for measuring the efficiency and productivity of Decision Making Units (DMUs). Evaluating the efficiency of DMUs that have two or more stages with conventional DEA models is equivalent to treating them as a black box. This approach omits the effect of intermediate measures on efficiency. Therefore, just the first network inputs and t...
Estimation of geochemical elements using a hybrid neural network-Gustafson-Kessel algorithm
Bearing in mind that lack of data is a common problem in the study of porphyry copper mining exploration, our goal was to identify the hidden patterns within the data and to extend the information to the data-less areas. To do this, a combination of pattern recognition techniques was used. In this work, a multi-layer neural network was used to estimate the concentration of geochemical ...
Efficient Data Mining with Evolutionary Algorithms for Cloud Computing Application
With the rapid development of the internet, the amount of information and data being produced is extremely large. As a result, clients can be confused by the huge amount of data and find it difficult to tell which parts are useful. Data mining can overcome this problem. When data mining is used in cloud computing, it reduces processing time, energy usage and costs. As the speed of ...
On Mining Fuzzy Classification Rules for Imbalanced Data
A fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification. One of the major issues when applying it to imbalanced data sets is its bias toward the majority class, so that it performs poorly on the minority class. However, in many cases the minority classes are more important than the majority ones. In this paper, we have extended ...
Journal title:
- CoRR
Volume: abs/1612.02908
Publication date: 2016